Wharton County
GRAM: Global Reasoning for Multi-Page VQA
Blau, Tsachi, Fogel, Sharon, Ronen, Roi, Golts, Alona, Ganz, Roy, Avraham, Elad Ben, Aberdam, Aviad, Tsiper, Shahar, Litman, Ron
The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally heavy pretraining. To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens, facilitating the flow of information across pages for global reasoning. To force our model to utilize the newly introduced document-level tokens, we propose a tailored bias adaptation method. For additional computational savings during decoding, we introduce an optional compression stage using our C-Former model, which reduces the encoded sequence length, thereby allowing a tradeoff between quality and latency. Extensive experiments showcase GRAM's state-of-the-art performance on the benchmarks for multi-page DocVQA, demonstrating the effectiveness of our approach.
- Europe > Russia (0.14)
- Asia > Russia (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Law (1.00)
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Egypt sets its sights on artificial intelligence
Interest in artificial intelligence is on the rise in Egypt as enterprises embrace emerging technology to expand into new markets, investors back AI startups and government initiatives support education and awareness of the technology. There is mounting evidence that private enterprise is embracing AI. Recently, for example, AI and analytics vendor fonYou partnered with a mobile operator in Egypt to use its AI module to reach the unbanked, and Widebot just raised a six-figure (USD) Pre-Series A investment for its Arabic language chatbot. Meanwhile, the government is looking to develop AI capabilities in a number of ways, including launching its first AI faculty at Kafr El Sheikh University. Egypt is aiming to have 7.7 percent of its GDP derived through AI by 2030, a figure touted in the PricewaterhouseCoopers (PwC) report, The Potential Impact of AI in the Middle East.
- Europe > Middle East (0.25)
- Africa > Middle East > Egypt > Kafr El Sheikh Governorate > Kafr El Sheikh (0.25)
- North America > United States > Texas > Wharton County (0.05)
- Professional Services (0.70)
- Government (0.60)